Relaxing the WDO Assumption in Blind Extraction of Speakers from Speech Mixtures
Abstract
The time-frequency masking approach to blind speech extraction consists of two main steps: clustering features in a space spanned by delay and attenuation rate, and masking the spectrogram to reconstruct the sources. A binary mask is usually generated under the strong W-disjoint orthogonality (WDO) assumption (disjoint orthogonal representations in the frequency domain). In practice, this assumption is violated in most cases, which degrades the quality of the reconstructed sources. In this paper we propose relaxing the WDO assumption by allowing some frequency bins to be shared by both sources. Because we detect instantaneous fundamental frequencies, mask creation is supported by exploiting the harmonic structure of speech. The proposed method is shown to be effective and reliable in experiments with both simulated and real recorded mixtures.
Keywords: blind source extraction, harmonic frequencies, histogram clustering, spectrogram analysis, speech reconstruction, time-frequency masking, W-disjoint orthogonality.
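The two steps named in the abstract can be illustrated with a minimal sketch. It assumes a two-microphone anechoic model in which the ratio of the mixture spectrograms at each time-frequency bin encodes one source's attenuation and delay. The function names, the unscaled Euclidean distance, and the `share_margin` parameter are hypothetical choices made for this illustration. In particular, the rule used here for sharing a bin (both cluster centres are almost equally close) is only a stand-in: the paper bases its shared bins on the harmonic structure of speech, which this sketch does not implement.

```python
import numpy as np

def duet_features(X1, X2, freqs, eps=1e-12):
    """Per-bin attenuation and delay estimates from two mixture STFTs.

    X1, X2: complex arrays of shape (n_freqs, n_frames).
    freqs:  nonzero angular frequency (rad/sample) of each row.
    Assumes X2/X1 ~ a * exp(-1j * w * d) in bins dominated by one source.
    """
    ratio = X2 / (X1 + eps)              # eps guards against empty bins
    attenuation = np.abs(ratio)
    delay = -np.angle(ratio) / freqs[:, None]
    return attenuation, delay

def assign_masks(attenuation, delay, peaks, share_margin=0.0):
    """Boolean masks of shape (n_sources, n_freqs, n_frames).

    peaks: (n_sources, 2) array of (attenuation, delay) cluster centres,
           e.g. taken from histogram peaks.
    share_margin = 0 gives the usual binary mask (nearest centre wins).
    share_margin > 0 also keeps a bin for any source whose centre is
    nearly as close as the best one, so that bin is shared.
    """
    feats = np.stack([attenuation, delay], axis=-1)      # (F, T, 2)
    diff = feats[None] - peaks[:, None, None, :]         # (S, F, T, 2)
    dist = np.linalg.norm(diff, axis=-1)                 # (S, F, T)
    best = dist.min(axis=0)                              # (F, T)
    return dist <= best + share_margin
```

A source estimate would then be obtained by multiplying one mask with a mixture spectrogram (for example `masks[0] * X1`) and inverting the transform. In practice the two feature axes would likely need rescaling before a joint distance is meaningful.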
Similar resources
Mokslas – Lietuvos Ateitis
We are developing two crucial improvements on the time-frequency masking approach to the blind speech separation of underdetermined mixtures when processing anechoic and echoic mixtures. First, the proposed method copes with the usually large amount of delay estimation error that appears in a low frequency band. This step generates a restrictive mask for phase delays on the basis of local and g...
Phase Aliasing Correction For Robust Blind Source Separation Using DUET
Degenerate Unmixing Estimation Technique (DUET) is a technique for blind source separation (BSS). Unlike the ICA-based BSS techniques, DUET is a time-frequency scheme that relies on the so-called W-disjoint orthogonality (WDO) property of the source signals, which states that the windowed Fourier transforms of different source signals have statistically disjoint supports. In addition to being co...
A Stochastic Speech Model Supporting W-Disjoint Orthogonality
In previous work, we have successfully used an ideal joint sparseness assumption: W-Disjoint Orthogonality (WDO). This assumption, that the time-frequency representations of the sources have disjoint support, is satisfied in an approximate sense by many signals of practical interest, including speech. Here we discuss results derived from a stochastic model of speech signals that justify the WDO...
Blind speech separation of moving speakers in real reverberant environments
In this paper we present a new online Blind Signal Separation method capable of separating convolutive speech signals of moving speakers in highly reverberant rooms. The separation network used is a recurrent network which performs separation of convolutive speech mixtures in the time domain, without any prior knowledge of the propagation media, based on the Maximum Likelihood Estimation (MLE) p...
Adaptive Blind Separation of Speech Signals: Cocktail Party Problem
In this paper we present an online adaptive scheme for blind separation of speech signals from their convolutive mixtures. This problem is often referred to as the cocktail party problem. When multiple speakers speak simultaneously in a teleconferencing studio, we need to separate out each speaker from their mixtures. If mixtures are assumed as instantaneous mixtures then it becomes standard blind sourc...
Publication date: 2010